Motivation: Protein interactions are fundamental building blocks 
of biochemical reaction systems underlying cellular functions. The 
complexity and functionality of such systems emerge not from the 
protein interactions themselves but from the dependencies between 
these interactions. Therefore, a comprehensive approach for inte- 
grating and using information about such dependencies is required. 

Results: We present an approach for endowing protein networks 
with interaction dependencies using propositional logic, thereby 
obtaining protein hypernetworks. First we demonstrate how this 
framework straightforwardly improves the prediction of protein 
complexes. Next we show that modeling protein perturbations 
in hypernetworks, rather than in networks, allows better inference of 
the functional necessity of proteins for yeast. Furthermore, 
hypernetworks improve the prediction of synthetic lethal interactions in 
yeast, indicating their capability to capture high-order functional 
relations between proteins. 

Conclusion: Protein hypernetworks are a consistent formal frame- 
work for modeling dependencies between protein interactions within 
protein networks. First applications of protein hypernetworks on 
the yeast interactome indicate their value for inferring functional 
features of complex biochemical systems. 

Availability: Data and software are publicly available at 
http://www.rahmannlab.de/research/hypernetworks. 

Contact: Eli.Zamir@mpi-dortmund.mpg.de, 
Sven.Rahmann@tu-dortmund.de 

1 Introduction 

A fundamental challenge in systems biology is understanding how 
cellular functions emerge from the collective action of interacting 
proteins. Ultimately such understanding could be achieved through 
a complete quantitative biochemical description of the system, in- 
cluding the concentrations and spatial distribution of all involved 
proteins and the kinetic constants of their interactions (Hughey 
et al, 2010; Kholodenko, 2006). However, despite the progress 
in technologies for measuring these parameters in cells, completing 
such a description for large intracellular biochemical systems is still 
beyond reach. In a complementary front, high-throughput protein- 
protein interaction (PPI) detection techniques, including yeast two- 
hybrid and mass spectrometry (Walther and Mann, 2010; Parrish 
et al, 2006), can provide static snapshots of complete interactomes, 
as demonstrated with several model organisms. The obtained in- 
formation is typically modeled as networks - simple graphs with 
nodes and edges corresponding to the proteins and their interac- 
tions, respectively. However, such a data structure cannot represent 
information about how protein interactions depend on each other. 

A key mechanism generating interaction dependencies is allosteric 
regulation, in which a protein undergoes conformational change 
upon one interaction which affects its other interactions (Laskowski 
et al, 2009). Another common type of interaction dependency 
is mutual exclusiveness arising from steric hindrance, which prevents 
proteins from binding simultaneously to identical or closely adjacent 
protein domains. Protein interaction dependencies determine the 
properties of biochemical systems, and it is therefore essential to 
consider them comprehensively. Importantly, a large amount of 
information about interaction dependencies can already be obtained 
through database mining, and can be further expanded by 
high-throughput experimental 
approaches (see Discussion). However, a comprehensive approach 
to integrate this knowledge for getting a better understanding of 
large biochemical systems is still required. 

Recent studies indicate that considering mutual exclusiveness be- 
tween interactions improves the quality of protein complex predic- 
tion in yeast (Ozawa et al, 2010; Jung et al, 2010). Here, we further 
expand and generalize this potential by enabling on one hand the 
integration of diverse types of interaction dependencies and on the 
other hand the exploration of different aspects of the system. We 
use propositional logic to model interaction constraints, and provide 
a flexible framework for their system-wise representation, called 
protein hypernetworks (Section 2). Next, we show how to mine 
hypernetworks for useful information, exemplified here as improving the 
quality of protein complex prediction (Section 3). Furthermore, our 
approach allows ranking the importance of each protein in a bio- 
chemical system based not only on its interactions but also on their 
dependencies. We demonstrate that such considerations help to predict 
which proteins are essential for yeast viability (Section 4). 
Finally, we discuss how our approach synergizes with current 
efforts to obtain system-level understanding of complex biochemical 
systems. 

2 Modeling Approach 
2.1 Protein Hypernetworks 

A protein network is commonly described as an undirected graph 
$(P, I)$ with a vertex $p \in P$ for each protein and an undirected edge 
$\{p, p'\} \in I$ for each possible interaction. We first develop an 
approach for incorporating interaction dependencies into this 
description, using propositional logic formulas. 

The propositional logic $\mathrm{Prop}(Q)$ is the set of all propositional logic 
formulas over the propositions $Q$ (the atomic units of the logic). 
This is the smallest set of formulas such that $q$ itself is a formula 
for all $q \in Q$ and that is closed under the following operations: For 
$\phi, \phi' \in \mathrm{Prop}(Q)$, all of $\neg\phi$, $\phi \wedge \phi'$, 
$\phi \vee \phi'$, and $\phi \Rightarrow \phi'$ are in $\mathrm{Prop}(Q)$ 
as well. The operators $\neg, \wedge, \vee, \Rightarrow$ have the usual 
semantics "not", "and", "or", and "implies", respectively. Note that the 
implication $\phi \Rightarrow \phi'$ is equivalent to $(\neg\phi \vee \phi')$. 
As propositions $Q$, we use both proteins $P$ and interactions $I$, so 
$Q := P \cup I$. A constraint is a 
formula with a particular structure over these propositions. 

Definition 1 (Constraint). A constraint is a propositional logic 
formula of the form $q \Rightarrow \psi$ with $q \in P \cup I$ and 
$\psi \in \mathrm{Prop}(P \cup I)$. With 
$\mathfrak{C}(P \cup I) \subset \mathrm{Prop}(P \cup I)$ we denote the set of all constraints. 

A constraint $q \Rightarrow \psi$ restricts the satisfiability of $q$ by the 
satisfiability of $\psi$. In other words: if $q$ is satisfied, then the same 
has to hold for $\psi$. A constraint $q \Rightarrow \psi$ is equivalent to the 
disjunction $\neg q \vee \psi$. We call the disjunct $\neg q$ the default or 
inactive case, because if $q$ is not true, then $\psi$ does not need 
to be satisfied. For example (see Fig. 1a), the dependency of an 
interaction $i$ on an allosteric effect due to a scaffold interaction $j$ 
can be formulated by the constraint $i \Rightarrow j$. Mutual exclusiveness 
of two interactions $i, j \in I$ can be modelled by the two constraints 
$i \Rightarrow \neg j$ and $j \Rightarrow \neg i$. Propositional logic also 
allows defining constraints of higher order: An interaction $i$ could be either 
dependent on two scaffold interactions $j_1$ and $j_2$ or compete with 
an interaction $j_3$, modeled by the constraint 
$i \Rightarrow ((j_1 \wedge j_2) \vee \neg j_3)$. 

Now, we can define protein hypernetworks as a set of proteins 
(nodes) connected by interactions (edges) extended by a set of con- 
straints (dependencies between nodes or edges): 

Definition 2 (Protein Hypernetwork). Let $P$ and $I$ be sets of 
proteins and interactions. Let $C \subseteq \mathfrak{C}(P \cup I)$ be a set of constraints 
that contains the default constraints $i \Rightarrow p \wedge p'$ for each 
interaction $i = \{p, p'\} \in I$. Then the triple $(P, I, C)$ is called a protein 
hypernetwork. 

Fig. 1a shows an example protein hypernetwork. While a protein 
hypernetwork is not a hypergraph (with hyperedges) in the classical 
sense, the name is appropriate because the constraints describe 
dependencies between the edges, which could be explicitly transformed 
into hyperedges, e.g., using minimal network states (cf. Sec. 2.2). 

2.2 Minimal Network States 

Following the incorporation of constraints in a protein hypernet- 
work, we now explain how to sum and propagate their effects in 
the system. The key idea is that it is sufficient to examine the 
implications for each protein or interaction $q \in P \cup I$ separately 
first, and then combine the information in a systematic way. We 
formalize this idea by defining sets of minimal network states. A 
minimal network state of $q$ tells us which other proteins or 
interactions are necessary or impossible to occur simultaneously with $q$. 
For each $q \in P \cup I$, we define a minimal network state formula, for 
which we then find certain satisfying models, which in turn define 
minimal network states. 

Definition 3 (Minimal network state formula). Let $(P, I, C)$ be a 
protein hypernetwork. For $q \in P \cup I$, the minimal network state 
formula of $q$ is 

$$\mathrm{MNS}_{(P,I,C)}(q) := \mathrm{MNS}(q) := q \wedge \bigwedge_{c \in C} c.$$

A solution for a propositional logic formula is captured by a 
satisfying model or interpretation given by a map 
$\alpha : P \cup I \to \{0, 1\}$ that 
assigns a truth value to each proposition. A formula is satisfiable if 
any satisfying model exists. We assume that $\mathrm{MNS}(q)$ is satisfiable 
for all $q \in P \cup I$, i.e., each single protein or interaction by itself is 
compatible with all constraints. 

For example, consider propositions $Q = \{q_1, q_2\}$ and a formula 
$\phi = \neg q_1 \wedge (q_1 \vee q_2)$. The only satisfying model is 
$\alpha : q_1 \mapsto 0, q_2 \mapsto 1$. 
In the protein hypernetworks framework, we interpret a model $\alpha$ as 
follows: A protein or interaction $q$ is said to be possible in $\alpha$ iff 
$\alpha(q) = 1$. All possible proteins and interactions may (but need not) 
exist simultaneously (spatially and temporally) in the cell. 
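
The small example above can be checked by brute force. The following is a minimal sketch, not the tableau procedure used in the paper: it enumerates all truth assignments and keeps the satisfying ones. The function name `satisfying_models` and the encoding of formulas as Python callables are assumptions made for illustration only.

```python
from itertools import product

def satisfying_models(propositions, formula):
    """Return all assignments (dicts) over `propositions` that satisfy `formula`.

    `formula` is any callable mapping an assignment dict to a truth value.
    Brute force over 2^n assignments, so only suitable for tiny examples.
    """
    models = []
    for values in product([0, 1], repeat=len(propositions)):
        alpha = dict(zip(propositions, values))
        if formula(alpha):
            models.append(alpha)
    return models

# phi = (not q1) and (q1 or q2), the example formula from the text
phi = lambda a: (not a["q1"]) and (a["q1"] or a["q2"])
models = satisfying_models(["q1", "q2"], phi)
print(models)  # [{'q1': 0, 'q2': 1}]
```

Exhaustive enumeration is exponential in the number of propositions, which is why a dedicated procedure is needed for realistic networks.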

There can be many satisfying models for $\mathrm{MNS}(q)$. Among these, 
we wish to enumerate all minimally constrained satisfying models 
(MCSMs). A suitable method for finding them is the tableau 
calculus for propositional logic (Smullyan, 1995). In a nutshell, the 
tableau algorithm decomposes a formula into its parts. It 
accumulates conjuncts, branches on disjuncts, and backtracks when a 
contradiction is encountered. More details are given in Sec. S1 of 
the Supplement. For finding MCSMs, our custom implementation 
ensures that for each constraint $q \Rightarrow \psi$ (i.e., disjunction 
$\neg q \vee \psi$), the default case $\neg q$ is explored first, and that 
$\psi$ is expanded only 
if the constraint is necessarily active, in order to avoid artificially 
constrained models. 

The general problem of deciding whether any given propositional 
logic formula $\phi$ is satisfiable is NP-complete. However, $\mathrm{MNS}(q)$ 
has a special structure: it is a conjunction of a proposition and of 
(many) constraints. If all constraints are of a particularly simple 
structure, we can prove a linear running time; see Sec. S1 in the 
Supplement. 
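
To illustrate why simple constraints admit a fast procedure, here is a minimal sketch under an explicit assumption: every constraint has the form "premise implies a conjunction of literals". The text does not spell out the "particularly simple structure", so this reading, the function name `minimal_state`, and the toy network are hypothetical. In this case the default-first strategy reduces to forward propagation from $q$, and each premise is expanded at most once.

```python
from collections import deque

def minimal_state(q, constraints):
    """Forward-propagate the constraints that become active once q is true.

    `constraints` maps a premise to a list of literals (entity, positive),
    read as: premise => conjunction of those literals (assumed simple form).
    Returns (necessary, impossible), or None if a contradiction arises.
    """
    necessary, impossible = {q}, set()
    queue = deque([q])
    while queue:
        premise = queue.popleft()
        for entity, positive in constraints.get(premise, []):
            if positive:
                if entity in impossible:
                    return None
                if entity not in necessary:
                    necessary.add(entity)
                    queue.append(entity)  # its constraints are now active too
            else:
                if entity in necessary:
                    return None
                impossible.add(entity)
    return necessary, impossible

# Toy hypernetwork (hypothetical): interactions AB and BC compete for B.
# Default constraints (AB => A and B) are merged with the mutual exclusion.
constraints = {
    "AB": [("A", True), ("B", True), ("BC", False)],
    "BC": [("B", True), ("C", True), ("AB", False)],
}
print(minimal_state("AB", constraints))
```

Constraints whose premise never becomes true stay in their default case and are never expanded, which mirrors the idea of avoiding artificially constrained models.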

Each MCSM $\alpha$ defines a minimal network state, consisting of 
both necessary and impossible entities. The intuition is that the 
necessary entities $k$ are simply the "true" ones ($\alpha(k) = 1$), and that 
the impossible entities are those that are explicitly forbidden by an 
active constraint. 

Definition 4 (Minimal Network State). Let $(P, I, C)$ be a protein 
hypernetwork and $q \in P \cup I$. Let $\alpha$ be a MCSM of $\mathrm{MNS}(q)$. We 
define sets of necessary and impossible proteins or interactions, 
respectively, as 

$$\mathrm{Nec}_\alpha := \{k \in P \cup I \mid \alpha(k) = 1\},$$

$$\mathrm{Imp}_\alpha := \{k \in P \cup I \mid \exists \text{ constraint } (q' \Rightarrow \psi) \in C \text{ with } \alpha(q') = 1 \text{ and } \psi \wedge k \text{ is unsatisfiable}\}.$$

The pair $(\mathrm{Nec}_\alpha, \mathrm{Imp}_\alpha)$ is called a minimal network state for $q$ 
(belonging to the MCSM $\alpha$). 

For each proposition $q$, there can be several minimal network 
states. We write $M_q$ for the set of all minimal network states for $q$. 
We call $M := M_{(P,I,C)} := \bigcup_{q \in P \cup I} M_q$ the set of all minimal network 
states for all proteins and interactions. 

Now we define a relation clashing, describing that two minimal 
network states cannot be combined without producing a conflict. 

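Definition 5 (Clashing Minimal Network States). Two minimal 
network states $(\mathrm{Nec}, \mathrm{Imp})$ and $(\mathrm{Nec}', \mathrm{Imp}')$ are clashing iff 
$\mathrm{Nec} \cap \mathrm{Imp}' \neq \emptyset$ or $\mathrm{Imp} \cap \mathrm{Nec}' \neq \emptyset$. 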
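
The clash test of Definition 5 amounts to two set intersections. A minimal sketch, assuming each minimal network state is stored as a pair of Python sets (this representation and the example states are illustrative choices, not taken from the paper's software):

```python
def clashing(state_a, state_b):
    """Return True iff two minimal network states clash (Definition 5).

    Each state is a pair (nec, imp) of sets of entity names.
    """
    nec_a, imp_a = state_a
    nec_b, imp_b = state_b
    return bool(nec_a & imp_b) or bool(imp_a & nec_b)

# Hypothetical states: AB and BC are mutually exclusive interactions.
m_ab = ({"AB", "A", "B"}, {"BC"})
m_bc = ({"BC", "B", "C"}, {"AB"})
m_a = ({"A"}, set())

print(clashing(m_ab, m_bc))  # True
print(clashing(m_ab, m_a))   # False
```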

As we prove in Theorem 1, in order to know if two proteins or 
interactions are simultaneously possible, it is sufficient to determine 
whether any pair of non-clashing minimal network states exists for 
them. 

Theorem 1. Let $(P, I, C)$ be a protein hypernetwork. Let 
$q, q' \in P \cup I$ be two proteins or interactions, $q \neq q'$. Assume that there 
exists a non-clashing pair of minimal network states 
$(m, m') \in M_q \times M_{q'}$. Then $q$ and $q'$ are possible simultaneously, i.e., the 
following formula is satisfiable. 

$$\xi := \Big(\bigwedge_{c \in C} c\Big) \wedge q \wedge q'.$$

Proof. Let $m = (\mathrm{Nec}, \mathrm{Imp}) \in M_q$ and 
$m' = (\mathrm{Nec}', \mathrm{Imp}') \in M_{q'}$ be 
non-clashing; we show that $\xi$ is satisfiable by defining a satisfying 
model $\alpha$. Define $\mathit{True} := \mathrm{Nec} \cup \mathrm{Nec}'$ and 
$\mathit{False} := \mathrm{Imp} \cup \mathrm{Imp}'$. Since 
$m$ and $m'$ are not clashing, $\mathit{True} \cap \mathit{False} = \emptyset$. 
Let $\alpha(r) := 1$ for 
$r \in \mathit{True}$, and $\alpha(r) := 0$ otherwise. We show that $\alpha$ satisfies all 
parts of $\xi$. 

The propositions $q$ and $q'$ in $\xi$ are satisfied since $q \in \mathrm{Nec}$ and 
$q' \in \mathrm{Nec}'$, so $\alpha(q) = \alpha(q') = 1$. 

For each $c^* = (r \Rightarrow \psi)$ in the conjunction $\bigwedge_{c \in C} c$, 
there are two cases: $\alpha(r) = 0$ or $\alpha(r) = 1$. If $\alpha(r) = 0$, then $c^*$ is 
satisfied regardless of the satisfaction of $\psi$ because of the implication 
semantics. If $\alpha(r) = 1$, or equivalently $r \in \mathit{True}$, then 
$r \in \mathrm{Nec}$ or 
$r \in \mathrm{Nec}'$ (or both). First, consider the case that $r \in \mathrm{Nec}$. By 
assumption, $c^* \in C$ is then satisfied in $\bigwedge_{c \in C} c \wedge q$. Additionally, it 
is not clashing with $q'$ because $\mathit{True} \cap \mathit{False} = \emptyset$. Therefore, it is 
also satisfied in $\xi$. The case $r \in \mathrm{Nec}'$ is analogous. □ 

Minimal network states are the basis for further inferences on 
hypernetworks, as we demonstrate in Sections 3 and 4. First, how- 
ever, we show that perturbations can be easily incorporated into the 
framework. 

2.3 Inclusion of Perturbation Effects 

The protein hypernetwork framework allows the consequences of 
perturbations to be computed systematically. We distinguish between 
perturbed and affected proteins or interactions: A perturbed one is 
the direct target of an experimental intervention which causes its 
complete removal from the system (e.g. by gene knock-down for 
proteins or point mutations for interactions), whereas an affected one 
is altered due to the propagation of the perturbation in the 
hypernetwork. Assume that proteins $P_\downarrow \subseteq P$ and interactions 
$I_\downarrow \subseteq I$ are 
perturbed, and thus removed from the system. The problem at hand 
is to compute all affected proteins and interactions. This is done by 
recursively removing minimal network states $m = (\mathrm{Nec}, \mathrm{Imp})$ that 
necessitate a perturbed or affected entity (protein or interaction) 
$q \in \mathrm{Nec}$, while counting a protein or interaction as affected once it 
has no minimal network state left. Formally, we proceed as follows. 

Definition 6. Let $M_q$ be the set of all minimal network states for 
entity $q$ (Definition 4), and let $M \subseteq M_{(P,I,C)}$ be any subset of all 
minimal network states. For a set of entities $A \subseteq P \cup I$, let 

$$M_A := \{(\mathrm{Nec}, \mathrm{Imp}) \in M \mid A \cap \mathrm{Nec} \neq \emptyset\}$$
Figure 1: (a) Principle of protein hypernetwork construction. A protein network (nodes and black edges) is overlaid with two interaction 
constraints: mutually exclusive interactions (top arc arrow) and activating allosteric interactions (lower arc arrow). Propositional logic 
formulas for these interaction constraints and the molecular mechanism generating them are shown. The protein hypernetwork is the 
plain network together with all such propositional logic constraints. (b) Protein complex prediction in four steps (see Sec. 3.2 for 
algorithmic details): (1) Computation of minimal network states (Sec. 2.2), (2) prediction of initial protein complexes, (3) computation 
of simultaneously possible protein subnetworks, (4) refinement of the predicted complexes. 



be the set of minimal network states from $M$ that become invalid 
when any entity in $A$ is perturbed. Let $R(A, M) := M \setminus M_A$ be 
the remaining set of minimal network states. Let 
$Q(A, M) := \{q \in P \cup I \mid M_q \cap R(A, M) = \emptyset\}$ be the set of 
entities for which no minimal 
network state is left. 

We recursively define a map $\rho$ that maps a set of perturbed 
entities and a set of minimal network states to the set of affected 
entities. Let $\rho : 2^{P \cup I} \times 2^{M_{(P,I,C)}} \to 2^{P \cup I}$ be defined by 

$$\rho(A, M) := \begin{cases} \emptyset & \text{if } A = \emptyset, \\ A \cup \rho(Q(A, M), R(A, M)) & \text{otherwise.} \end{cases}$$

Let $(P, I, C)$ be a protein hypernetwork with perturbations 
$P_\downarrow \subseteq P$ and $I_\downarrow \subseteq I$ and minimal network states 
$M_{(P,I,C)}$. Then 

$$Q_\downarrow := \rho(P_\downarrow \cup I_\downarrow, M_{(P,I,C)})$$

is the set of all affected proteins and interactions. 
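
The propagation described in this section can be sketched as an iterative fixpoint computation. This is an illustrative reading, not the authors' implementation: the function name `affected_entities`, the dictionary layout, and the toy network are assumptions. One deliberate deviation is flagged in the code: only newly affected entities are carried into the next round, so that the loop terminates.

```python
def affected_entities(perturbed, states):
    """Compute all affected entities for a set of perturbed entities.

    `states` maps each entity to a list of minimal network states,
    each given as a pair (nec, imp) of sets of entity names.
    """
    affected = set(perturbed)
    remaining = {q: list(ms) for q, ms in states.items()}
    frontier = set(perturbed)
    while frontier:
        # Drop every state that necessitates an entity of the frontier.
        for q in remaining:
            remaining[q] = [(nec, imp) for nec, imp in remaining[q]
                            if not (nec & frontier)]
        # Assumption: only entities that newly lost all their states are
        # propagated further, which guarantees termination.
        frontier = {q for q, ms in remaining.items() if not ms} - affected
        affected |= frontier
    return affected

# Hypothetical network: interaction BC depends on the scaffold interaction AB.
states = {
    "A": [({"A"}, set())],
    "B": [({"B"}, set())],
    "C": [({"C"}, set())],
    "AB": [({"AB", "A", "B"}, set())],
    "BC": [({"BC", "B", "C", "AB", "A"}, set())],
}
print(sorted(affected_entities({"AB"}, states)))  # ['AB', 'BC']
print(sorted(affected_entities({"A"}, states)))   # ['A', 'AB', 'BC']
```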

This provides a module that enables any algorithm that makes 
predictions based on protein networks to be applied also on a per- 
turbed network, considering the dependencies between interactions. 

3 Result I: Hypernetworks Improve Prediction of 
Protein Complexes 

3.1 Rationale 

When considering only the interactions between proteins, but not 
their dependencies, the prediction of protein complexes often relies 
on identifying dense regions in protein networks (Spirin and Mirny, 
2003; Bader and Hogue, 2003; Li et al, 2005). Indeed, algorithms 
for the prediction of complexes based on plain protein networks 
(P,I) like MCODE (Bader and Hogue, 2003) and LCMA (Li et al, 
2005) have been shown to provide reasonable results by detecting 
such dense regions. However, many of the complexes predicted in 
this way are false positives, since interaction dependencies do not 
allow their assembly. Along this line, it was recently shown that 
considering mutual exclusiveness between interactions improves 
the quality of protein complex prediction (Jung et al, 2010). 

Here, we first provide a general framework that can build on an ar- 
bitrary network-based complex prediction method and ensures that 
the predicted complexes do not violate arbitrary given interaction 
constraints within the hypernetwork. Thus, the framework is much 
more general than the work by Jung et al. (2010); in practice, how- 
ever, the main problem is obtaining sufficiently many constraints 
(see Discussion). We demonstrate that our framework improves 
complex prediction on the yeast network in conjunction with the 
established constraints using the LCM algorithm (Li et al., 2005) as 
an example network-based complex prediction method. 



3.2 Algorithm 

The prediction of protein complexes in hypernetworks consists of 
four steps, illustrated in Fig. 1b. First, for each protein and 
interaction $q \in P \cup I$, the set of minimal network states $M_q$ is obtained. 
Then, with a network based complex prediction algorithm, an 
initial set of protein complexes is predicted. Each complex $c$ is given 
as a subnetwork $(P_c, I_c)$. 

The third step is more complicated. Let 
$M_c := \bigcup_{q \in P_c \cup I_c} M_q$ 
be the set of minimal network states of the complex's entities. We 
want to combine the individual states without introducing clashes, 
as formalized by the following definition. 

Definition 7 (Maximal combination of minimal network states). 
For a complex $c$, a set $M \subseteq M_c$ is called a maximal combination 
of minimal network states iff (1) there exists no clashing pair of 
minimal network states in $M$, and (2) the inclusion of any further 
minimal network state from $M_c$ would result in a clashing pair. 

All maximal combinations of minimal network states for a given 
complex $c$ can be obtained by recursively building a tree of minimal 
network states to be removed from $M_c$. The root of the tree is 
annotated with $M_c$; each other node is annotated with a remaining set 
$M$. If $M$ does not contain any pair of clashing states, the node is a 
leaf, and $M$ is added to the result set of maximal combinations. 
Otherwise, we take any $m$ with clashing $m', m'', \ldots$ and branch off two 
children which remove $m$ on the one hand, and remove $m', m'', \ldots$ 
on the other hand. The tree is explored in a depth-first manner, 
checking for redundancies in each node. Let 
$\mathcal{M}_c \subseteq 2^{M_c}$ be the set 
of all found maximal combinations of minimal network states. Its 
cardinality equals the number of non-redundant leaves in the 
removal tree. For each maximal combination $M \in \mathcal{M}_c$, we generate 
the corresponding subnetwork of $(P, I)$. 
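
The removal tree can be sketched as a short recursive search. The following is a hedged illustration rather than the authors' code: states are referred to by list index, the function names are hypothetical, and the "checking for redundancies" step is interpreted here as (a) skipping node sets already visited and (b) discarding leaves that are proper subsets of other leaves.

```python
def clashing(a, b):
    """Definition 5: states are pairs (nec, imp) of frozensets."""
    return bool(a[0] & b[1]) or bool(a[1] & b[0])

def maximal_combinations(states):
    """Return all maximal clash-free subsets of `states` as sets of indices."""
    n = len(states)
    clashes = {i: {j for j in range(n)
                   if j != i and clashing(states[i], states[j])}
               for i in range(n)}
    leaves, seen = set(), set()

    def explore(current):
        if current in seen:          # redundancy check (assumed reading)
            return
        seen.add(current)
        for m in sorted(current):
            partners = clashes[m] & current
            if partners:
                explore(current - {m})       # child 1: remove m
                explore(current - partners)  # child 2: remove its partners
                return
        leaves.add(current)                  # no clashing pair left

    explore(frozenset(range(n)))
    # Keep only leaves that are not strictly contained in another leaf.
    return [s for s in leaves if not any(s < t for t in leaves)]

# Hypothetical states of a predicted complex: AB and BC compete for B.
states = [
    (frozenset({"AB", "A", "B"}), frozenset({"BC"})),
    (frozenset({"BC", "B", "C"}), frozenset({"AB"})),
    (frozenset({"B"}), frozenset()),
]
result = maximal_combinations(states)
print(sorted(sorted(s) for s in result))  # [[0, 2], [1, 2]]
```

The final subset filter is needed because a leaf of the removal tree is clash-free but not necessarily maximal on its own.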

Definition 8 (Simultaneous Protein Subnetwork). Let $M \in \mathcal{M}_c$ 
be a maximal combination of minimal network states for complex $c$. 
Let $P_M$ be the set of all necessary proteins and $I_M$ the set of all 
necessary interactions in $M$, i.e., 
$P_M := P \cap \bigcup_{(\mathrm{Nec}, \mathrm{Imp}) \in M} \mathrm{Nec}$ 
and $I_M := I \cap \bigcup_{(\mathrm{Nec}, \mathrm{Imp}) \in M} \mathrm{Nec}$. 
Then $(P_M, I_M)$ is called a 
simultaneous protein subnetwork. 

All proteins and interactions in $(P_M, I_M)$ may exist 
simultaneously in the context of the protein hypernetwork $(P, I, C)$ because 
the minimal network states in $M$ do not clash with each other. In 
comparison to the subnetwork for the network based predicted 
complex $(P_c, I_c)$, each subnetwork $(P_M, I_M)$ may have lost and gained 
several interactions or proteins. 

Finally, in the fourth step, we perform a network based 
complex prediction on each simultaneous protein subnetwork $(P_M, I_M)$ 
Table 1: Application of interaction constraints improves the 
quality of protein complex prediction. The procedure described in 
Sec. 3.2 was performed without constraints, with the 458 constraints 
reported in Jung et al. (2010), and with 100 independent samples of 
458 randomly generated constraints (cf. Supplement Sec. S3). (A) 
Results for the benchmark set of the 55 connected complexes out of 
the 267 annotated MIPS complexes. (B) Results for the benchmark 
set of the 62 connected complexes out of all 1142 MIPS complexes. 

(A) annotated MIPS complexes      precision      recall
plain (no constraints)            0.142          0.792
458 constraints                   0.206          0.792
458 random constr.; mean±SD       0.149±0.005    0.782±0.02

(B) all MIPS complexes            precision      recall
plain (no constraints)            0.15           0.76
458 constraints                   0.21           0.76



again with the same algorithm as during the initial step (as pro- 
posed by Jung et al. (2010); thereby it has to be ensured that the 
network based complex prediction does not produce biased results 
when performed only on subnetworks). The proteins and interac- 
tions in the new complexes are simultaneously possible. However, 
the prediction may miss necessary interactions and proteins outside 
the initially predicted complex. Therefore, we force these omitted 
entities to be contained in the corresponding predicted complex. 

3.3 Experiments 

To evaluate the refined complex prediction, we use the 
Comprehensive Yeast Genome Database (CYGD; Güldener et al. (2005)) 
description of the yeast S. cerevisiae interactome (last revision 
01-10-2008, 4579 proteins connected by 12567 interactions), which is 
often used to benchmark protein complex prediction algorithms 
(Bader and Hogue, 2003; Li et al, 2005; Altaf-Ul-Amin et al, 2006; 
Jung et al, 2010; Feng et al, 2010). This choice ensures consistency 
with the used collection of interaction dependencies (Jung et al, 
2010), that was defined on top of CYGD. Additionally, CYGD con- 
tains a gold standard for complex prediction known as the MIPS 
dataset (1142 known complexes, last revision 18-05-2006), in the 
following referred to as CYGD complexes. From these, 267 com- 
plexes are annotated with their biological function and considered 
to be reliable (Li et al, 2005; Jung et al, 2010; Feng et al, 2010). 

As an example, we use the local clique merging algorithm (LCMA, Li 
et al. (2005), see Supplement Sec. S2.1) as the underlying 
network-based protein complex prediction tool. Since LCMA cannot predict 
complexes containing fewer than three proteins or complexes that 
are not connected in the underlying protein network, we restrict the 
benchmark set of the 267 reliable complexes to the 55 connected ones 
of at least three proteins, and the benchmark set of all 1142 MIPS 
complexes similarly to 62 complexes. 

We created a protein hypernetwork from the yeast protein 
network by incorporating as constraints 458 pairs of mutually exclusive 
interactions reported by Jung et al. (2010). Together, these 
mutually exclusive interactions constrain 329 interactions, which are 2.7% 
of all interactions (we refer to these as the 458 constraints from now on, 
although each mutually exclusive pair of interactions $i, j$ is in fact 
modelled by two constraints, $i \Rightarrow \neg j$ and $j \Rightarrow \neg i$). 

Our refined predictions are compared to the known CYGD 
complexes: Following literature conventions (Bader and Hogue, 2003; 
Li et al, 2005; Altaf-Ul-Amin et al, 2006; Jung et al, 2010) to 
ensure comparability, we consider a predicted complex 
$c \in \mathcal{C}_{(P,I,C)}$ 
to match a CYGD complex $c' \in \mathcal{C}_{CYGD}$ iff 
$\sqrt{|c \cap c'|^2 / (|c| \cdot |c'|)} > \sqrt{0.2} \approx 0.45$. 

Let $\mathcal{B} := \mathcal{C}_{CYGD}$ be the benchmark set of CYGD complexes and 
$\mathcal{P} := \mathcal{C}_{(P,I,C)}$ be the set of predicted complexes. By 
$FP \subseteq \mathcal{P}$ we 
denote the set of false positives, that is, predictions that were not 
found in the benchmark. By $FN \subseteq \mathcal{B}$ we denote the set of false 
negatives, that is, the complexes in the benchmark that were not 
predicted. The recall and precision of a prediction are defined as: 

$$\text{recall} := \frac{|\mathcal{B}| - |FN|}{|\mathcal{B}|} \quad \text{and} \quad \text{precision} := \frac{|\mathcal{P}| - |FP|}{|\mathcal{P}|}.$$


Figure 2: Minimal network state graph and PIS. BFS from node 
A and I results in a higher PIS than from node C, because of the 
competition between interactions AB and BG and the dependency 
of GH on HI, respectively. For the underlying hypernetwork see 
Figure 1. 

Table 1 shows that constraining only 2.7% of all known interac- 
tions already increases the precision while the recall remains con- 
stant. In contrast, applying 458 randomly generated constraints 
modelling mutually exclusive interactions (see Supplement Sec. S3 
for details) does not provide an improvement and even reduces the 
recall. Sec. S2.2 of the Supplement furthermore shows the develop- 
ment of recall and precision when the true constraints are gradually 
introduced in a random order. 

The prediction of protein complexes in hypernetworks uses 
network-based complex prediction as an autonomous module. 
Therefore, the choice of LCMA as the algorithm for this module 
here is arbitrary for demonstrating the benefit of using protein hy- 
pernetworks. Any LCMA-analogous algorithm can be plugged sim- 
ilarly into the hypernetwork approach to ensure that the complexes 
it predicts do not violate constraints, thereby improving the predic- 
tion quality. Indeed, Jung et al. (2010) show that this is also true 
for the MCODE complex prediction algorithm (Bader and Hogue, 
2003). Note that the hypernetwork formalism further allows such 
algorithms to be harnessed to predict changes in protein complexes upon 
perturbations, as we described in Sec. 2.3. 

4 Result II: Hypernetworks Improve Prediction of 
Protein Functional Importance 

4.1 Perturbation Impact Score 

Given a plain protein network, the functional importance of each 
protein is often estimated based on its number of interactions or 
connectivity (Jeong et al, 2001). Hypernetworks with their 
constraints additionally allow dependencies between the 
interactions to be taken into account. We thus propose the perturbation impact score 
(PIS), which indicates the amount of change a perturbation induces 
in the possible states of a protein network. First, we define the 
minimal network state graph that shows the influence of a perturbation 
(Fig. 2). 

Definition 9 (Minimal network state graph). Let $(P, I, C)$ be a 
protein hypernetwork. Consider a graph $G_{MNS} := (P \cup I, E)$ with 
directed edges $E$ defined as follows. For each protein or interaction 
$q \in P \cup I$, consider each minimal network state 
$(\mathrm{Nec}, \mathrm{Imp}) \in M_q$ 
and each entity $q' \in \mathrm{Nec} \cup \mathrm{Imp}$; then $E$ consists of all such edges 
$(q', q)$. The directed graph $G_{MNS}$ is called minimal network state 
graph. 

An edge $(q', q)$ in $G_{MNS}$ represents that the perturbation of $q'$ 
affects the possible configurations of the network around $q$. On the 
one hand, if $q'$ is necessary for $q$, then $q$ will become impossible once 
$q'$ is perturbed. On the other hand, if $q'$ is mutually exclusive with $q$, 
the disappearance of $q'$ will also have an effect on the configuration 
of the network around $q$. 
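
The construction of the minimal network state graph can be sketched as follows. The adjacency-dictionary representation, the function name `mns_graph`, and the toy states are assumptions for illustration. Because every state of $q$ contains $q$ itself among its necessary entities, a literal reading of the definition also yields self-loops; they are kept here.

```python
def mns_graph(states):
    """Build the minimal network state graph as an adjacency dict.

    `states` maps each entity q to a list of (nec, imp) pairs of sets.
    The result maps q' to the set of all q with an edge (q', q).
    """
    graph = {q: set() for q in states}
    for q, minimal_states in states.items():
        for nec, imp in minimal_states:
            for q_prime in nec | imp:
                graph.setdefault(q_prime, set()).add(q)
    return graph

# Hypothetical hypernetwork: AB and BC are mutually exclusive interactions.
states = {
    "A": [({"A"}, set())],
    "B": [({"B"}, set())],
    "C": [({"C"}, set())],
    "AB": [({"AB", "A", "B"}, {"BC"})],
    "BC": [({"BC", "B", "C"}, {"AB"})],
}
graph = mns_graph(states)
print(sorted(graph["A"]))   # ['A', 'AB']
print(sorted(graph["AB"]))  # ['AB', 'BC']
```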

Now the PIS can be defined for a set of perturbed proteins or 
interactions. 

Definition 10 (Perturbation Impact Score). Let $(P, I, C)$ be a 
protein hypernetwork. Let $Q_\downarrow \subseteq P \cup I$ be the set of perturbed 
proteins or interactions. Let $R_\downarrow$ be the set of nodes reachable from $Q_\downarrow$ 
Figure 3: Performance of $PIS^{(458)}$ and $PIS^{(0)}$ (connectivity) in predicting viability of perturbations. (a) Scatterplot of connectivity 
against difference PIS minus connectivity; note that always PIS $\geq$ connectivity. (b) Cumulative distribution function (cdf) of $PIS^{(458)}$ 
and $PIS^{(0)}$ for viable and lethal/sick perturbations (axes span region of distinguishable values). (c) Differences between the cdfs (for 
details see Supplement, Sec. S4.2) of viable and lethal/sick perturbations, for 0 constraints, 458 constraints, and 458 random constraints. 
For the latter, 100 samples of 458 random constraints were drawn (see Supplement Sec. S3), and the figure shows the area between the 
mean plus minus one standard deviation. Note that the $PIS^{(458)}$ curve is always above the $PIS^{(0)}$ curve, and that the $PIS^{(0)}$ curve 
agrees well with the randomized curve, indicating the contribution of constraints to a better discrimination between viable and lethal/sick 
perturbations. 



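in the minimal network state graph. Define a distance function 
$\mathrm{dist}_{Q_\downarrow} : R_\downarrow \to \mathbb{N}$ such that 
$\mathrm{dist}_{Q_\downarrow}(q)$ is the shortest path length 
in $G_{MNS}$ between $q$ and any node in $Q_\downarrow$ (this is well-defined because 
consideration is restricted to reachable nodes $q$). The perturbation 
impact score of the set $Q_\downarrow$ is defined as 

$$PIS_{(P,I,C)}(Q_\downarrow) := \sum_{q \in R_\downarrow} \mathrm{dist}_{Q_\downarrow}(q).$$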






The PIS models the idea that a protein or interaction is likely to 
be more important the further its perturbation propagates through 
the network. The distances $\mathrm{dist}_{Q_\downarrow}(q)$ can be computed by a standard 
breadth-first search (BFS) on $G_{MNS}$ beginning with the nodes in 
$Q_\downarrow$. 
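
A minimal sketch of this computation, assuming the PIS is the sum of the BFS distances over all reachable nodes and that the graph is given as an adjacency dictionary (both the function name and the two toy graphs are illustrative assumptions). The first graph contains only default constraints, the second adds a mutual exclusion between the interactions AB and BC.

```python
from collections import deque

def perturbation_impact_score(graph, perturbed):
    """Sum of shortest-path distances from the perturbed set (multi-source BFS).

    `graph` maps a node q' to the set of nodes q with an edge (q', q).
    """
    dist = {q: 0 for q in perturbed}
    queue = deque(perturbed)
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values())

# Plain network A - B - C (default constraints only, hypothetical).
plain = {"A": {"A", "AB"}, "B": {"B", "AB", "BC"}, "C": {"C", "BC"},
         "AB": {"AB"}, "BC": {"BC"}}
# Same network with AB and BC mutually exclusive.
constrained = {"A": {"A", "AB"}, "B": {"B", "AB", "BC"}, "C": {"C", "BC"},
               "AB": {"AB", "BC"}, "BC": {"BC", "AB"}}

print(perturbation_impact_score(plain, {"A"}))        # 1
print(perturbation_impact_score(constrained, {"A"}))  # 3
```

In the plain graph the score of A equals its connectivity; with the mutual exclusion, the perturbation of A additionally reaches BC at distance 2.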

When computing the perturbation impact score $PIS_{(P,I,C)}(\{p\})$ 
of a single protein $p$ that neither appears in a constraint itself 
nor has a neighbor that does, its score is equal to its 
connectivity. Since connectivity was shown to correlate with the functional 
importance of proteins (Jeong et al, 2001), the incorporation of 
constraints in the PIS can be expected to further enhance this 
prediction quality. As will be shown, PIS also allows measuring the 
impact of combinations of perturbations, and thereby inferring functional 
relations between proteins such as synthetic lethality. 

4.2 Experiments 

We evaluated PIS against plain connectivity as estimators for the 
functional importance of proteins. Recalling that PIS without 
constraints equals connectivity, we let $PIS^{(0)} := PIS_{(P,I,\emptyset)}$ be the 
connectivity, and $PIS^{(458)} := PIS_{(P,I,C)}$ the score for the 
constrained CYGD-based yeast protein hypernetwork $(P, I, C)$ as in 
Sec. 3.3. By definition, $PIS^{(458)}$ is always equal to or higher than 
connectivity (Fig. 3a). 

We assume that perturbation of functionally important proteins 
is more likely to produce sickness or cell death. Accordingly, for 
benchmarking, we classified perturbations as lethal/sick and vi- 
able according to the Saccharomyces Genome Database of null mu- 
tant phenotypes (1-4-2011, Cherry et al. (1998); see Supplement, 
Sec. S4.1, for details). From the distribution of PIS for both of 
these classes of perturbations (Fig. 3b) it is apparent that proteins 
whose null mutants are lethal/sick tend to have a higher PIS, 
regardless of the consideration of constraints. 

Therefore, we investigated more closely the increase in PIS caused
by the interaction constraints. While 7% of the lethal or sick
perturbations exhibit an increased PIS upon consideration of
constraints, only 2% of the viable ones do. To measure the separation
between the classes, we subtract the cdf of lethal/sick from that of
viable. The higher this difference, the better the separation (for
details see Supplement, Sec. S4.2). Fig. 3c shows this measure for 0
and 458 constraints, and as a mean over 100 samples of 458 random
constraints. The application of 458 random constraints does not alter
the difference compared to 0 constraints. In contrast, the application
of the true 458 constraints
increases the difference, most obviously for a PIS between 20 and
100, and hence improves the discrimination between lethal/sick
and viable perturbations.

Figure 4: Quality of the prediction of lethal/sick and viable
perturbations as a function of the threshold used for the
classification, for $\mathrm{PIS}^{(0)}$, $\mathrm{PIS}^{(458)}$ and 1000
samples of 458 random constraints (see Supplement Sec. S3). x-axis:
threshold given as percentage of perturbations above a certain PIS or
connectivity value; y-axis: combined ratio of true positives for
lethal/sick and viable perturbations,
$(TP_{lethal}/P_{lethal} + TP_{viable}/P_{viable})/2$.

In general, PIS and plain connectivity agree for most proteins
(overlaid points along the horizontal axis in Fig. 3a), because only a
few constraints are available so far. However, for several proteins the
difference is striking. For example, the yeast protein SME1, which
is required for mRNA splicing and whose perturbation is lethal to
the cell (Güldener et al., 2005), has only 6 binding partners in the
CYGD, a relatively low connectivity that does not reflect its
biological importance. With the application of constraints, the PIS
of SME1 increases to 111 and therefore correctly suggests that
perturbation of this protein would be lethal. Counterexamples, where
the introduction of constraints wrongly increases the PIS of a viable
protein, also exist. However, they are a minority, as indicated by the
fact that constraints improve the overall performance of PIS (e.g.
Fig. 3c).

To illustrate the use of PIS to predict the functional importance 
of a protein, we predicted lethal/sick and viable perturbations by 
systematically applying a threshold t to the PIS. If a protein had 
a PIS of at least t we predicted its perturbation to be lethal/sick, 
while we predicted it to be viable for a PIS less than t. Fig. 4 
shows the prediction quality for different thresholds, when using 0
or 458 constraints or 1000 samples of 458 random constraints. To
ensure the comparability of thresholds t, they are expressed as the 
percentage of proteins reaching or exceeding the PIS or connectivity 
threshold (e.g., t = 50 means a certain value of PIS or connectivity 
such that half of the proteins reach or exceed it). It can be seen 
that every non-trivial threshold performs better than the trivial 
ones that predict no (t = 0) or all (t = 100) perturbations to be 
lethal/sick. Further, the application of the true constraints provides
an improved prediction (especially below t = 20) in comparison to
random constraints (Fig. 4).

Figure 5: PIS is an indicator for lethal/sick synthetic
perturbations. (a) Cumulative distribution functions (cdfs) of PIS for
synthetic perturbations with 0 and 458 constraints, distinguished
between lethal/sick (as defined by Tong et al. (2004)) and the rest
of synthetic perturbations. (b) Differences of cdfs (cf. Supplement,
Sec. S4.2) of viable and lethal/sick perturbations, for 0 and 458
constraints and 100 samples of 458 random constraints (cf.
Supplement, Sec. S3). Note that for PIS > 50 the $\mathrm{PIS}^{(458)}$
curve lies above the $\mathrm{PIS}^{(0)}$ curve, indicating the
contribution of constraints to a better discrimination between viable
and lethal/sick synthetic perturbations.
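The threshold-based classification and the quality measure plotted in Fig. 4 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `prediction_quality`, the input format, and the handling of ties (all proteins with a score equal to the cutoff are predicted lethal/sick) are choices made for the example, and both classes are assumed to be non-empty.

```python
def prediction_quality(scores, lethal, t_percent):
    """Combined true positive ratio (TP_lethal/P_lethal + TP_viable/P_viable) / 2.

    scores:    dict protein -> PIS or connectivity (hypothetical input format)
    lethal:    set of proteins annotated lethal/sick; all others count as viable
    t_percent: threshold as percentage of proteins reaching or exceeding the cutoff
    """
    ranked = sorted(scores.values(), reverse=True)
    k = round(len(ranked) * t_percent / 100)
    # Predict lethal/sick for every protein whose score reaches the k-th largest value.
    if k == 0:
        predicted = set()
    else:
        cutoff = ranked[k - 1]
        predicted = {p for p, s in scores.items() if s >= cutoff}
    viable = set(scores) - lethal
    tp_lethal = len(predicted & lethal)
    tp_viable = len(viable - predicted)
    return (tp_lethal / len(lethal) + tp_viable / len(viable)) / 2
```

With this measure, both trivial thresholds (t = 0 and t = 100) score 0.5, which is the baseline that non-trivial thresholds are compared against.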

Many null mutations do not affect viability when occurring alone,
but become lethal when occurring together with another specific
null mutation (i.e. synthetic lethality), indicating functional
buffering and relation between the corresponding proteins (Tong et al.,
2001, 2004). To evaluate whether PIS can capture these pair-wise
protein relations, we investigated its ability to predict synthetic
perturbations by calculating $\mathrm{PIS}_{(P,I,C)}(\{p, p'\})$ for every
pair of proteins $p, p' \in P$. Note that without constraints, PIS here
equals counting the union of neighbors of two proteins. Fig. 5 shows
that PIS provides a striking separation between lethal/sick and viable
perturbations (as defined by Tong et al. (2004)). Further, the
application of constraints induces a shift of the scores toward higher
values, which results in an improved discrimination between viable and
lethal/sick synthetic perturbations (Fig. 5b). In contrast, using 100
samples of 458 random constraints again does not provide an
improvement, and even decreases the discrimination quality.
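The unconstrained case described above (connectivity for a single protein, the union of neighbors for a pair) can be sketched in a few lines. The function name `unconstrained_pis` and the adjacency format are assumptions for illustration. Whether the perturbed proteins themselves are excluded from the count is not specified in this section, so the sketch keeps the plain union.

```python
def unconstrained_pis(adjacency, perturbed):
    """Count the union of neighbours of all perturbed proteins.

    adjacency: dict protein -> set of interaction partners (hypothetical format)
    perturbed: iterable of perturbed proteins, e.g. {p} or {p, p'}
    """
    neighbours = set()
    for p in perturbed:
        neighbours |= set(adjacency.get(p, ()))
    return len(neighbours)
```

For a single protein this reduces to its connectivity; for a pair, shared neighbours are counted once.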

We conclude that PIS is an improvement over connectivity as a
predictive measure for functional importance, as it allows the
integration of interaction constraints from hypernetworks. Since only
2.7% of interactions are constrained in our experiments, improvements
by constraints are naturally small here. We expect the capability
of PIS to discriminate between lethal/sick and viable perturbations
to increase further as information about additional interaction
constraints becomes available.



5 Discussion 

The dependencies of protein interactions encode the capability of
PPI systems to process information and execute cellular decisions.
We developed an approach to unfold this dimension of information
by incorporating interaction constraints generated by allosteric
regulations and competitive binding. On the level of individual
proteins, competition between interactions on the same binding
domain leads to their complete mutual exclusiveness. Similarly,
allosteric regulations typically generate all-or-none switches between
a non-binding and a binding state (Laskowski et al., 2009).
Therefore, propositional logic can capture these fundamental
processes perfectly, and additionally facilitates their algorithmic
integration. Similarly, protein hypernetworks can incorporate
regulations of protein interactions by post-translational
modifications (e.g. phosphorylation), as these are often on/off
switches describable by propositional logic (e.g.
$\{A, B\} \Rightarrow \{B, \mathrm{PO_4}\}$ states that B has to be
phosphorylated to allow its interaction with A). The temporal
expression and spatial distribution of intracellular proteins, which
were shown to be valuable dimensions of information (Han et al., 2004;
Walther and Mann, 2010), can also be incorporated into the
hypernetwork framework by discretizing them in time (e.g. cell-cycle
phases or developmental stages) and space (e.g. by sub-cellular
compartment or by tissue).

The question addressed in this work is how to use information
about interaction dependencies, rather than how to collect it.
Nevertheless, it should be noted that a significant amount of
information about interaction dependencies can already be obtained
through curation from literature. Along this line, we are currently
developing a text-mining tool to assist the identification of
publications reporting interaction dependencies. As for future
publications, since automatic curation of protein interactions is
facilitated by a structured text format (Leitner et al., 2010; Ceol
et al., 2008a), our work motivates its usage to report interaction
dependencies (see Supplementary Sec. S5.1). Mutual exclusiveness
between protein interactions can also be inferred from
protein-domain-annotated interactome databases (Ooi et al., 2010) or
in-silico docking modeling (Wass et al., 2011; Mosca et al., 2009).
Finally, high-throughput quantification of protein interactions at
domain resolution and methods for monitoring high-order interactions
(Jain et al., 2011; Hruby et al., 2011; Heinze et al., 2004) would
provide comprehensive identification of interaction dependencies.

Here, we illustrated that even constraining less than 3% of the
interactions in the CYGD is sufficient to improve complex prediction,
consistent with previous results (Jung et al., 2010). It is expected
that the actual fraction of constrained interactions is much higher,
to allow a dynamic and functional yeast interactome. These
additional interaction constraints would be due to allosteric
regulations (Laskowski et al., 2009), generation and elimination of
binding sites upon protein phosphorylation and dephosphorylation
(Seet et al., 2006) and more cases of mutually exclusive interactions.

We proposed a perturbation impact score that provides a measure
for a protein's importance within a hypernetwork. We have shown
that this measure improves the prediction of functionally important
proteins compared to plain connectivity, owing to the use of
interaction dependencies as constraints. As more constraints are
reported, the measure should help to rationally design perturbation
experiments for network analysis (Zamir and Bastiaens, 2008) and
provide mechanistic insights into large PPI systems.

Our data and software for protein hypernetworks are available; 
please refer to the Supplement (Sec. S5) for implementation details. 
Protein Hypernetworks: a Logic Framework for Interaction Dependencies and Perturbation Effects in Protein Networks (Supplementary Data)



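[Figure S1 tableau: the root $\alpha \Vdash \neg A \wedge (A \vee B)$ is
expanded into $\alpha \Vdash \neg A$ on the same path, which then branches
into $\alpha \Vdash A$ (marked ↯) and $\alpha \Vdash B$ (marked ✓).]

Figure S1: Tableau for the propositional logic formula
$\phi = \neg A \wedge (A \vee B)$. The path marked by ↯ does not lead to a
satisfying model $\alpha$, because it contains a contradiction between the
assumptions $\alpha \Vdash \neg A$ and $\alpha \Vdash A$. The path marked
by ✓ is free of contradictions, hence its generated model satisfies $\phi$.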

This supplement contains background reference material on the
Tableau Algorithm (Sec. S1), additional material on protein complex
prediction with hypernetworks (Sec. S2), details on the generation
of random constraints for null models (Sec. S3), and supplementary
material on the prediction of protein functional importance with the
PIS defined in the main article (Sec. S4). We also provide software
implementation details (Sec. S5), including resource consumption
and details on the representation of constraints.

S1 Background: Tableau Algorithm

A suitable method for finding satisfying models is the tableau
calculus for propositional logic (Smullyan, 1995): For an input formula
$\phi$, it generates a deductive tree (the tableau) of assumptions about
$\phi$. Each assumption $a$ in the tree can be made due to an assumption
$a'$ in an ancestral node. We say that $a'$ results in $a$, and the
generation of $a$ out of $a'$ is called expansion of $a'$. The
propositional logic tableau algorithm generates satisfying models
$\alpha$ for $\phi$. We write $\alpha \Vdash \psi$ if $\alpha$ satisfies a
subformula $\psi$. The tableau algorithm now generates assumptions of the
type $\alpha \Vdash \psi$ with $\psi$ being a subformula of the input
formula. That is, a conjunction $\alpha \Vdash \psi_1 \wedge \psi_2$ is
expanded into $\alpha \Vdash \psi_1$ and $\alpha \Vdash \psi_2$ on the
same path, and a disjunction $\alpha \Vdash \psi_1 \vee \psi_2$ results in
branching into $\alpha \Vdash \psi_1$ and $\alpha \Vdash \psi_2$ (see
Fig. S1).

Each path from the root to a leaf represents a model a. If a path 
does not contain any contradictory assumptions, the model satis- 
fies the input formula. Implementations of the tableau algorithm 
explore the tree in a depth-first way, and use backtracking once a 
contradiction occurs. 
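The expansion rules and the depth-first exploration with backtracking can be illustrated by a small sketch. It is a generic textbook-style tableau and not the implementation used in our software: the tuple encoding of formulas and the name `tableau_models` are assumptions made for the example, and negated compound formulas are rewritten by De Morgan's laws before being expanded.

```python
def tableau_models(formula):
    """Yield one partial model (dict variable -> bool) per contradiction-free path.

    Formulas are nested tuples (hypothetical encoding):
    ("var", name), ("not", f), ("and", f, g), ("or", f, g).
    """
    def expand(todo, model):
        if not todo:
            yield dict(model)  # leaf reached without contradiction
            return
        f, rest = todo[0], todo[1:]
        op = f[0]
        if op == "var" or (op == "not" and f[1][0] == "var"):
            name, value = (f[1], True) if op == "var" else (f[1][1], False)
            if model.get(name, value) != value:
                return  # contradiction: close this path and backtrack
            yield from expand(rest, {**model, name: value})
        elif op == "and":  # both conjuncts stay on the same path
            yield from expand([f[1], f[2]] + rest, model)
        elif op == "or":  # branch into the two disjuncts
            yield from expand([f[1]] + rest, model)
            yield from expand([f[2]] + rest, model)
        else:  # "not" applied to a compound formula
            g = f[1]
            if g[0] == "not":
                yield from expand([g[1]] + rest, model)
            elif g[0] == "and":
                yield from expand([("or", ("not", g[1]), ("not", g[2]))] + rest, model)
            else:  # g[0] == "or"
                yield from expand([("and", ("not", g[1]), ("not", g[2]))] + rest, model)

    yield from expand([formula], {})
```

For the formula of Fig. S1, $\neg A \wedge (A \vee B)$, the branch assuming $A$ is closed and the sketch yields the single model with $A$ false and $B$ true. Different branches may yield the same model more than once for other formulas; the sketch does not remove such duplicates.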

Different variations of the tableau algorithm exist. For example, 
one may be interested only in the decision "does a satisfying model 
exist?", or the task could be to output an (arbitrary) satisfying 
model (if one exists), or to list all satisfying models. The latter 
is the task we face when enumerating minimal network states. In 
theory, the tableau algorithm exhibits an exponential worst case 
complexity, as it operates by complete enumeration of all cases with 
backtracking. 

However, elaborate backtracking strategies can significantly reduce
the running time in practice. Insights into such strategies
and implementation details are provided by Li (2008). Also, faster
heuristics exist, like GSAT (Selman et al., 1992), but they are not
adequate for the problem, as they do not guarantee a correct and
complete answer.

For our purpose, the implementation has to ensure that 
1. for each constraint $q \Rightarrow \psi$ (i.e., disjunction
$\neg q \vee \psi$), the default case $\neg q$ is explored first, and
that $\psi$ is expanded only if the constraint is necessarily active,
in order to avoid artificially constrained models;

2. all satisfying models that comply with 1. are enumerated in the 
process. 

In our application, the tableau algorithm can be expected to perform
acceptably, since we solve only minimal network state formulas
with a fixed structure (a conjunction of constraints) and expect
most constraints to be of a simple form. In particular, we expect
mostly mutually exclusive interactions, modeled by constraints of the
form $i \Rightarrow \neg j$, and scaffold-dependent interactions that can
be represented by a constraint of the form $i \Rightarrow j$. To prove
the performance of the tableau algorithm when all constraints are of
this form, for a protein hypernetwork $(P, I, C)$ we now show that it
will need only $O(|C|)$ expansions to find a satisfying model. Since
expansions generate the deductive tree, that also limits all tableau
operations like backtracking or contradiction tests to be polynomial
in $|C|$.

Theorem S1. Let $\mathrm{MNS}_{(P,I,C)}(q)$ with $q \in P \cup I$ be the
minimal network state formula for a protein hypernetwork $(P, I, C)$.
Assume that each constraint in $C$ is of the form
$c = (q_1 \Rightarrow \ell)$ with a literal $\ell \in \{q_2, \neg q_2\}$
and $q_1, q_2 \in P \cup I$. Then the tableau algorithm needs at most
$O(|C|)$ expansions to find a satisfying model.

Proof. We show that a constraint that is active cannot be rendered
inactive again when assuming that the formula is satisfiable. Assume
that an active constraint $c = (q_1 \Rightarrow \ell)$ is the cause of a
conflict, hence $\ell$ contradicts some literal $\ell'$. Since we require
above that the tableau explores the inactive case first, we know that
$\neg q_1$ caused a contradiction, too. We now assume that $\ell$ is
removed and we expand $c$ to $\neg q_1$ again to resolve the
contradiction. Then, the formula is found to be not satisfiable,
because $\neg q_1$ can either contradict $q$ or another constraint, in
which case the argument can be applied recursively.

Now we show that each constraint is expanded at most two times.
There are three cases: (1) The constraint is never activated; then
only the inactive case is expanded and the constraint is expanded
only once. (2) The constraint is activated immediately because
$\neg q_1$ leads to a conflict. This needs two expansions. (3) The
constraint is first inactive and then activated because of
backtracking. This again needs two expansions. Hence, the tableau
algorithm needs to perform at most $1 + 2|C| = O(|C|)$ expansions. □

Note that $\mathrm{MNS}_{(P,I,C)}(q)$ with the above simple constraints is
essentially a Horn formula, for which it is known that calculating a
satisfying model has polynomial complexity with specialized algorithms
(Dowling, 1984). However, proving the complexity of the general
tableau algorithm for this case remains useful: While we expect
most of our constraints to have this simple form, we cannot be
sure for all of them. Hence it is reasonable to provide a
computational approach that can handle full propositional logic, but
will have comparable complexity to specialized Horn formula algorithms
in the majority of cases.
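For comparison, the kind of specialized procedure alluded to above can be sketched as a forward propagation over constraints of the form $q_1 \Rightarrow \ell$. This is a generic Horn-style illustration and not the tableau implementation described in this section: the name `horn_propagate`, the triple encoding of constraints, and the reduction of the formula to a single asserted start entity plus such constraints are all assumptions made for the example.

```python
from collections import deque


def horn_propagate(start, constraints):
    """Find the minimal set of entities forced true when `start` is asserted.

    constraints: list of (q1, q2, positive) triples (hypothetical encoding);
                 positive=True encodes q1 => q2, positive=False encodes q1 => not q2.
    Returns the set of true entities (all others are false), or None if the
    constraints cannot be satisfied together with `start`.
    """
    by_premise = {}
    for q1, q2, positive in constraints:
        by_premise.setdefault(q1, []).append((q2, positive))

    true_vars = {start}
    forbidden = set()  # entities that a true premise forces to be false
    queue = deque([start])
    while queue:
        q1 = queue.popleft()  # each premise, and thus each constraint, is handled once
        for q2, positive in by_premise.get(q1, []):
            if positive:
                if q2 not in true_vars:
                    true_vars.add(q2)
                    queue.append(q2)
            else:
                forbidden.add(q2)
    return None if true_vars & forbidden else true_vars
```

Each constraint is inspected at most once, so the work is linear in $|C|$, in line with the bound shown for the tableau algorithm on such constraints.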

S2 Complex Prediction in Hypernetworks 

In this section, we provide more details on protein complex predic- 
tion, supplementing Section 3.3 of the main article. 
Figure S2: LCMA precision and recall for different merge similarity
thresholds $\omega$ on the CYGD protein network and complexes, as
described in the main article.



S2.1 Background: LCMA

We briefly summarize the steps of the Local Clique Merging Algorithm
(LCMA; Li et al. (2005)) as an exemplary complex prediction
algorithm based on plain networks that can be improved by
introducing protein hypernetworks. In a first step, LCMA finds a set of
local cliques; second, it iteratively merges those with a significant
overlap (given by a merge similarity threshold $\omega$). The complex
prediction consists of all merged cliques once no further merges happen
or average density falls below 95% of the previous iteration. While
the authors propose $\omega = 0$, we found LCMA to perform better with
higher thresholds on the plain CYGD (Güldener et al., 2005) network
and complexes (Fig. S2). Now, a higher threshold $\omega$ means
that less clique merging is performed. The best choice of $\omega = 1.0$
means that the clique merging step is not performed at all, because
only cliques that overlap by 100% (i.e. that are identical) would be
merged. Therefore, we chose $\omega = 0.4$ heuristically as a compromise
between prediction quality and originally intended behaviour.

S2.2 Effects of Gradually Introducing Constraints on
Complex Prediction in Hypernetworks

In the main article (Sec. 3.3), we showed that applying all 458 avail- 
able constraints from Jung et al. (2010) resulted in an improved pre- 
cision, while leaving the recall constant when predicting the CYGD 
complexes. Here we investigate the effect of a gradual application of
constraints, in order to gain insight into their actual effects. To
this end, we randomly sampled subsets of all 458 available constraints
of sizes between 4 (1% of 458) and 453 (99% of 458). More precisely,
for each $i \in \{1, 2, \ldots, 99\}$, we generated 50 independent
samples of size i% of 458 (rounded to the nearest integer).

Precision and recall as a function of the number of applied con- 
straints. Fig. S3 shows the development of precision and recall as 
a function of the number of applied constraints. While the recall is 
independent of the number of applied constraints (Fig. S3a), high 
numbers of constraints (> 100) consistently provided an improve- 
ment in the precision over the unconstrained instance with precision 
0.15 (Fig. S3b). This indicates that constraining only about 1% of 
the interactions is already sufficient to robustly improve complex 
prediction. The maximum achieved precision then decreases gradu- 
ally when applying more than 100 constraints and appears to reach a 
plateau upon using all available constraints. The minimum achieved 
precision rarely drops below the final precision value 0.20. 

We offer the following explanation: Note that initially both the 
number of predicted complexes and the number of false positive 
predictions increase (Figs. S3c and S3d), but the latter one at a 
slower rate. Upon application of more interaction constraints both 
quantities reach a plateau and eventually decrease. How might this 
come about? A false positive complex may contain two interactions 
that are in reality mutually exclusive. The corresponding constraint 
might not be sampled when applying few constraints, resulting in 
one false positive prediction. When the number of constraints in- 
creases, the refinement step leads to two simultaneous protein sub- 
networks, on which again nearly the original complex without one 
of the exclusive interactions is predicted. Each of the two complexes 
may now be closer to a true benchmark complex, but the number
of constraints may still be too low to turn it into a true positive.
Thus refinement of one false positive complex might initially lead to
two or more smaller false positive complexes. This may underlie the
observation that after an initial increase of the precision, showing
a generally beneficial effect of constraints, there is a stationary
phase with even slightly decreasing precision.

Figure S4: Complex prediction precision when removing all false
positive complexes that contain at most 23 proteins. The red and
green dashed lines mark the values obtained when none or all of the
constraints are applied, respectively. The yellow line indicates the
mean value for each number of applied constraints.

Figure S5: Matching accuracy for each CYGD complex over
gradual application of constraints; mean values over 50 independent
samples. Complexes with constant accuracy and complex
Nop58/Nop56/Nop1 are not shown.

Since the available constraints affect less than 3% of all
interactions, an important challenge is to extrapolate the development
of the precision as a function of much higher numbers of constraints.
It is reasonable to hypothesize that the precision will eventually
start to increase upon constraining more interactions. Since we cannot
prove this hypothesis directly at this point, we instead mimicked the
effect of adding further constraints that may destroy small false
positive complexes (< 24 proteins) by artificially removing them from
the prediction. After an initial decrease this leads to a precision
increase when applying more than 100 constraints (Fig. S4). This
observation is consistent with our hypothesis that the application
of further constraints should lead to an increase of precision on all
complexes.

Accuracy on single complexes. Complementary to the precision
and recall of the whole prediction, we examined single complexes
as well. As in Sec. 3.3 of the main article, we consider a predicted
complex $c$ to match a CYGD complex $c'$ iff the matching accuracy
$\sqrt{|c \cap c'|^2 / (|c| \cdot |c'|)}$ exceeds the threshold of
$\sqrt{0.2}$.
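The matching criterion can be written down directly. The following sketch assumes that complexes are represented as sets of protein names; the function names `matching_accuracy` and `matches` are chosen for illustration, and both complexes are assumed to be non-empty.

```python
from math import sqrt


def matching_accuracy(predicted, benchmark):
    """Matching accuracy sqrt(|c ∩ c'|^2 / (|c| * |c'|)) of two protein sets."""
    overlap = len(predicted & benchmark)
    return sqrt(overlap ** 2 / (len(predicted) * len(benchmark)))


def matches(predicted, benchmark, threshold=sqrt(0.2)):
    """True iff the matching accuracy exceeds the threshold (sqrt(0.2) by default)."""
    return matching_accuracy(predicted, benchmark) > threshold
```

For example, two complexes of three proteins each that share two proteins reach an accuracy of 2/3 and therefore match, while disjoint complexes have accuracy 0.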

When monitoring the accuracy of each CYGD complex c' while 
gradually introducing constraints, we would expect that the accu- 
racy with its best matching prediction c remains constant or in- 
creases. Indeed, while the accuracy remains constant for most of 
the 55 complexes, it increases for four complexes, but there are 
also two complexes whose accuracy does not follow this expectation
(Fig. S5; one of the latter not plotted).

The Nop58/Nop56/Nop1 complex (CYGD ID 440.12.30), one of
the two complexes with decreasing accuracy, is not shown in Fig. S5
because it contains one constraint (Nop56 and Nop58 compete for
the same binding domain of Nop1), so that it disappears once this
constraint is applied.

Figure S3: Complex prediction quality as a function of the number of applied constraints from 50 random samples for each step: (a)
recall, (b) precision, (c) total number of predicted complexes, (d) false positive predictions. The red and green dashed lines mark the
values obtained when none or all of the constraints are applied, respectively. The yellow line indicates the mean value for each number of
applied constraints.

The accuracy of the Gim3/Gim5/Gim4/PAC10/YKE2 complex (CYGD ID 177,
cyan in Figure S5) first increases, then decreases, approximately
returning to the initial value in the end. We identified the
following constraints as reducing its accuracy:



$\{SMC3, SMC3\} \Rightarrow \neg \{SMC1, SMC3\}$
$\{ARP6, SWD3\} \Rightarrow \neg \{CLA4, SWD3\}$
$\{SKP1, CDC53\} \Rightarrow \neg \{MET30, CDC53\}$



These findings do not necessarily imply that those constraints are
wrong. Rather, they hinder the heuristic LCMA in predicting the two
complexes by altering the density of the corresponding regions in
the simultaneous protein subnetwork. This shows that predicting
complexes by density, while it seems a good strategy in general,
does indeed fail in single cases.

S3 Generation of Random Constraints

It is important to compare the effects of (presumably) true known 
constraints with the effect of random constraints in order to show 
that observed effects are not simply due to applying constraints per 
se. Here, we specify how we generate random constraints of the 
type "mutually exclusive interaction" . 

To generate a random constraint, we randomly choose a protein
$p_1 \in P$ of the network, and randomly select two different
neighbours $p_2, p_3 \in P$. We interpret $p_1$ as the host protein and
$p_2, p_3$ as two proteins competing on the same binding domain of
$p_1$. Thereby we obtain the constraints
$\{p_1, p_2\} \Rightarrow \neg \{p_1, p_3\}$ and
$\{p_1, p_3\} \Rightarrow \neg \{p_1, p_2\}$.

For the CYGD hypernetwork (Sec. 3.3 of the paper), we itera- 
tively generate 458 of these constraint pairs (we refer to such a pair 
simply as one constraint). By independently repeating this process 
n times, we gain n independent samples of 458 random constraints. 
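The sampling procedure can be sketched as follows. This is an illustration, not the code of our software suite: the function name `random_exclusive_constraints`, the adjacency format and the `seed` parameter are assumptions made for the example. The text does not state how hosts with fewer than two neighbours or repeated draws are handled; the sketch draws hosts only among proteins with at least two neighbours and does not filter duplicates.

```python
import random


def random_exclusive_constraints(adjacency, n, seed=None):
    """Sample n random 'mutually exclusive interaction' constraints.

    adjacency: dict protein -> set of interaction partners (hypothetical format)
    Returns a list of ((p1, p2), (p1, p3)) tuples; each tuple stands for the pair
    {p1, p2} => not {p1, p3} and {p1, p3} => not {p1, p2}.
    """
    rng = random.Random(seed)
    # Neighbours of each protein, ignoring self-interactions (an assumption).
    partners = {p: sorted(set(nbrs) - {p}) for p, nbrs in adjacency.items()}
    hosts = sorted(p for p, nbrs in partners.items() if len(nbrs) >= 2)
    if not hosts:
        raise ValueError("no protein has two different neighbours")
    constraints = []
    for _ in range(n):
        p1 = rng.choice(hosts)                 # host protein
        p2, p3 = rng.sample(partners[p1], 2)   # two different competitors
        constraints.append(((p1, p2), (p1, p3)))
    return constraints
```

Calling the function repeatedly with different seeds gives independent samples, for example 100 samples of 458 constraints each.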

S4 Protein Functional Importance Prediction
with Hypernetworks

In this section, we provide more details on the prediction of protein
functional importance, supplementing Section 4 of the main article.



S4.1 Phenotypes of Null Mutants in the
Saccharomyces Genome Database

We used the Saccharomyces Genome Database (SGD) (Cherry
et al., 1998) to classify perturbations as lethal/sick. SGD collects
the generated phenotypes of perturbation experiments for most
proteins that are also in the CYGD, our selected benchmark. SGD
phenotypes are provided in a standardized way; a complete list is
provided by the Ontology Lookup Service (Cote et al., 2006). To fit our
modelling of perturbation and the notion of functional importance
in the main article, we considered only "null mutant" perturbations
rather than e.g. overexpression experiments. Table S1 shows those
phenotypes that were counted as lethal/sick. In contrast, the class
of viable perturbations contains all that are annotated with the
phenotype "viable" and are not contained in the class of lethal/sick
ones.

Table S1: SGD phenotypes selected to be classified as lethal/sick.
A phenotype is composed of an observable and a qualifier.

observable            qualifier
cell death            increased, increased rate
apoptosis             increased, increased rate
autolysis             increased, increased rate
cell lysis            increased, increased rate
necrotic cell death   increased, increased rate
competitive fitness   decreased
viability             decreased
vegetative growth     decreased
inviable

Figure S6: Example for cdf based separation analysis of two
completely separated distributions: uniform distribution $U_1$ on
[0,40] and uniform distribution $U_2$ on [60,100]. (a) Cumulative
distribution functions (cdfs) for $U_1$ (dashed) and $U_2$ (solid);
x-axis: value range of the distributions. E.g. at x = 50, the cdf of
$U_1$ has reached 1, while that of $U_2$ is still zero. (b) Absolute
difference of the two cdfs.


S4.2 Analysis of cumulative PIS distributions 

To assess the capability of PIS to indicate functionally important
proteins, we analysed its distribution for disjoint classes like
lethal/sick and viable. To this end, we calculated empirical
cumulative distribution functions (cdfs; normed cumulative
histograms) for both classes. Maximum separation is provided if the
"lethal/sick" cdf does not increase over 0 before the "viable" cdf
reaches 1 (Fig. S6a). No separation means that both cdfs have the same
values for each score. To better compare several cdf pairs, we
calculate the absolute difference between the two cdfs (Fig. S6b). The
higher the absolute difference, the better is the separation. If the
absolute difference reaches 1.0 at any point, the distributions are
completely separated.
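The separation analysis can be reproduced with a few lines of code. The sketch below uses only the standard library; the function names `ecdf` and `cdf_difference` and the evaluation on a fixed grid of score values are assumptions made for illustration.

```python
from bisect import bisect_right


def ecdf(sample, grid):
    """Empirical cdf of `sample`, evaluated at every value in `grid`."""
    ordered = sorted(sample)
    return [bisect_right(ordered, x) / len(ordered) for x in grid]


def cdf_difference(scores_a, scores_b, grid):
    """Absolute difference of the empirical cdfs of two score samples."""
    cdf_a = ecdf(scores_a, grid)
    cdf_b = ecdf(scores_b, grid)
    return [abs(a - b) for a, b in zip(cdf_a, cdf_b)]
```

Applied to two samples that mimic Fig. S6, e.g. the integers 0 to 40 and 60 to 100 on the grid 0 to 100, the difference reaches 1.0 at x = 50, which indicates complete separation.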






Table S2: Representation of possible interaction constraints in structured text. Keywords and proteins are annotated with a defined
ontology term id. This way, finding and parsing constraints is improved while human readability is maintained.

constraint: mutually exclusive interactions
structured text: x (uniprotkb:x) competes (MI:0941) with y (uniprotkb:y) for interaction (MI:0407)
with z (uniprotkb:z).

constraint: negative allosteric regulation by protein binding
structured text: interaction (MI:0407) between x (uniprotkb:x) and y (uniprotkb:y) allosterically
(SBO:0000239) inhibits (SBO:0000407) the interaction (MI:0407) of y (uniprotkb:y)
with z (uniprotkb:z).

constraint: positive allosteric regulation by protein binding
structured text: interaction (MI:0407) between x (uniprotkb:x) and y (uniprotkb:y) allosterically
(SBO:0000239) activates (SBO:0000461) the interaction (MI:0407) of y (uniprotkb:y)
with z (uniprotkb:z).

constraint: negative regulation by phosphorylation
structured text: interaction (MI:0407) between x (uniprotkb:x) and y (uniprotkb:y) is inhibited
(SBO:0000407) if x (uniprotkb:x) is phosphorylated (GO:0016310) on residue i.

constraint: positive regulation by phosphorylation
structured text: interaction (MI:0407) between x (uniprotkb:x) and y (uniprotkb:y) is activated
(SBO:0000461) if x (uniprotkb:x) is phosphorylated (GO:0016310) on residue i.



S5 Software Implementation 

We implemented protein hypernetworks as a JAVA™ based soft- 
ware suite. The suite consists of ProteinHypernetworkEditor 
that allows the definition and editing of protein hypernetworks, 
and ProteinHypernetwork that implements the prediction meth- 
ods presented in the main article. Further, both tools provide 
a graphical user interface and extensive visualization and im- 
port/export capabilities. The software suite can be obtained at 
http://www.rahmannlab.de/research/hypernetworks. 

S5.1 Representation of Constraints

An important challenge is the definition of a widely accepted format 
for the interchange of interaction dependencies or constraints. While 
SBML (Hucka et al, 2003) is suitable in principle, it provides a bio- 
chemical view of interactions and therefore contains overhead that 
is unnecessary for the definition of a protein hypernetwork. Instead 
we propose a two level approach for the interchange of constraints. 

Level 1: Structured Text. Ceol et al. (2008b) proposed a
machine-readable structured abstract that should be published along
with papers on protein interactions to allow automated curation (see
also Leitner et al. (2010)). The format combines human-readable
sentences with machine-readable annotation, and is already capable
of representing the expected types of constraints. For example, a
pair of mutually exclusive interactions (as reported by Jung et al.
(2010)) can be represented as follows: "ARC40 (uniprotkb:P38328)
is a competitor (MI:0941) of BEM2 (uniprotkb:P39960) for interaction
(MI:0407) with CLA4 (uniprotkb:P48562)". Table S2 provides
a generalized representation for the major types of interaction
constraints.
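To illustrate how such annotated sentences could be processed automatically, the following sketch parses the "mutually exclusive interactions" template of Table S2 with a regular expression. It is a hypothetical example and not part of our software suite: the pattern, the function name `parse_competition` and the output format are assumptions, and only the exact wording of the Table S2 template is recognized.

```python
import re

# Template from Table S2: "x (uniprotkb:x) competes (MI:0941) with y (uniprotkb:y)
# for interaction (MI:0407) with z (uniprotkb:z)."
COMPETITION = re.compile(
    r"(?P<x>\S+) \(uniprotkb:(?P<x_id>\w+)\) competes \(MI:0941\) with "
    r"(?P<y>\S+) \(uniprotkb:(?P<y_id>\w+)\) for interaction \(MI:0407\) with "
    r"(?P<z>\S+) \(uniprotkb:(?P<z_id>\w+)\)"
)


def parse_competition(sentence):
    """Return the two mutual exclusion constraints encoded by the sentence.

    Each constraint is a pair (premise, excluded) of interactions, read as
    premise => not excluded. Returns None if the sentence does not match.
    """
    m = COMPETITION.search(sentence)
    if m is None:
        return None
    x, y, z = m["x"], m["y"], m["z"]
    return [((z, x), (z, y)), ((z, y), (z, x))]
```

For the example of ARC40 and BEM2 competing for CLA4, phrased with the Table S2 template, the sketch returns the two constraints {CLA4, ARC40} => not {CLA4, BEM2} and {CLA4, BEM2} => not {CLA4, ARC40}.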

Level 2: HypernetworkML. With the hypernetwork markup language
(HypernetworkML, Koster (2011)) we provide an XML-based
file format (Bray et al., 1998) that is more suitable for possible
large-scale studies providing many constraints at once and for
permanent storage of the data. HypernetworkML is a combination of two
established XML-based formats: Interactions and proteins are
represented as a graph using GraphML (Brandes et al., 2002), whereas
embedded MathML (Carlisle et al., 2003) is used for a propositional
logic definition of constraints. Hence, HypernetworkML is capable
of representing a complete protein hypernetwork while maintaining
compatibility with known standards.

S5.2 Resource Consumption

A complex prediction on the defined yeast hypernetwork takes 12 
seconds using all 4 cores of an Intel® Core™ i5 CPU with 2.8GHz. 
The prediction of the PIS of 4579 single protein perturbations takes 
8 seconds. In both cases the software uses approximately 750MB 
RAM during prediction. 